Abstract. Credit scoring risk management is a fast-growing field, driven by rising consumer credit demand. Credit requests from new and existing customers are often evaluated with classical discrimination rules based on customer information. However, such strategies have serious limits and do not take into account the differences in characteristics between current customers and future ones. The aim of this paper is to measure the creditworthiness of non-customer borrowers and to model potential risk given a heterogeneous population formed by borrowers who are customers of the bank and others who are not. We build on previous work on generalized Gaussian discrimination and transpose it to the logistic model to derive efficient discrimination rules for the non-customers subpopulation. We thereby obtain several simple models linking the parameters of the two logistic models associated with the two subpopulations. The German credit data set is used to test and compare these models. Experimental results show that using links between the two subpopulations improves classification accuracy for new loan applicants.



1 Introduction 



Credit risk is one of the major risks that a lending institution has to manage. This risk arises when a borrower does not pay his debt by the due date. To face this kind of risk, bank managers have to look for efficient solutions to distinguish good-risk from bad-risk applicants. Credit scoring is one of the most successful financial risk management solutions developed for lending institutions; it has been fundamental in consumer credit management since Durand (1941). Authors such as Feldman (1997), Thomas et al. (2002) and Saporta (2006) define credit scoring as the process of determining how likely a particular applicant is to default on repayment.

Credit scoring methods are applied in order to classify possible creditors into two classes of risk: good and bad (Giudici, 2003). These methods use explanatory variables obtained from applicant information to estimate the applicant's expected ability to pay back the loan. A large number of classification methods can be used to identify borrowers' behavior, such as decision trees (Breiman et al., 1984), neural networks (McCulloch and Pitts, 1943), discriminant analysis (Fisher, 1936; Mahalanobis, 1936) and logistic regression (Cox, 1970; Cox and Snell, 1989).

All these techniques can provide good discrimination, but the most commonly used methods for building scorecards (i.e. credit models) are discriminant analysis and logistic regression.

Logistic regression is a more appropriate technique for credit scoring cases (Henley and Hand, 1996). Fan and Wang (1998) and Sautory et al. (1992) recommend the use of binary logistic regression in credit scoring when the conditions for applying discriminant analysis are not met. This choice becomes imperative if qualitative variables are involved in the model (Bardos, 2001).

The information available about a credit applicant is a fundamental element in the acceptance of his credit request; a lack of information in credit risk assessment is likely to lead to wrong decisions. In this paper, we focus on credit scoring evaluation using the logistic regression technique when the population of interest is of small size.

The borrower's behavior is described by a binary target variable denoted $Y$, whose value is a basic element in the credit granting decision: $Y = 0$ when the borrower presents a problem and $Y = 1$ otherwise. Besides this variable, every borrower is also described by a set of descriptive variables $\{X_1, X_2, \ldots, X_d\}$ giving information about the borrower and about the functioning of his accounts.

The sample of loan applicants comes from a heterogeneous population formed by borrowers who are customers and others who are not. Here we deal with the problem of discrimination in the case of a mixture of subpopulations, where the two subpopulations are respectively customer borrowers and non-customer borrowers. More precisely, we focus on evaluating the creditworthiness of the non-customers subpopulation, assuming that the sample size of this subpopulation is small.

Starting from the hypothesis that sample size is one of the most important factors affecting the classification power of the logistic regression technique, we evaluate the ability of future customers (i.e. non-customers) to pay back loans by looking for an efficient solution to the problem of the small non-customer sample size.

We investigate how the information at hand on customer and non-customer borrowers can be used efficiently. The first approach, which is generally used by banks, consists in using the predictive model built on customer borrowers to predict the behavior of non-customer borrowers. However, it does not take into account the difference between the two subpopulations. Another approach consists in using a learning sample drawn from the non-customers subpopulation to build their own predictive model. However, this second approach needs a learning sample of suitable size, which is not available in our case.

Modifying the second approach can bring an efficient solution to the problem of learning from a small sample. The modification consists in using a design sample drawn from another population considered slightly different (e.g. the customers subpopulation); this sample is used in model building in place of the non-customers design sample. The idea of using two slightly different populations to estimate the parameters of one population was first proposed by Biernacki et al. (2002). In a multinomial context, Biernacki et al. (2002) proved that two slightly different populations are linked through linear relations. The allocation rules for the unlabeled sample were obtained by estimating the parameters of the linear relationship, using five constrained models on the linear relationships.

This approach proved to be efficient in a biological context, and many extensions of this work have been proposed, including Biernacki and Jacques (2007), Bouveyron and Jacques (2009), as well as Beninel and Biernacki (2005, 2007, 2009). Beninel and Biernacki (2009) extended this approach to multinomial logistic discrimination and proposed an additional link model in the case where the two subpopulations studied are Gaussian. The main idea of these previous works is that information related to one of the two subpopulations contains some information related to the other one.

These earlier works are the starting point of this paper. Given the previous results for the Gaussian mixture model and the six models presented by Beninel and Biernacki (2009), our task is to deepen these results with more tests and simulations, and to add a seventh model to the former link models.



2 Logistic Regression Model 
2.1 Classical Logistic Regression 

Logistic regression is a variant of linear regression used when the dependent variable is binary and the independent variables are continuous, categorical, or both. The logistic regression model provides a linear function of the descriptors as a discrimination tool; this technique is widely used in credit scoring applications due to its simplicity and explainability. The model is given by



$$\log\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \sum_{j=1}^{d} \beta_j x_i^j, \tag{1}$$

where 

- $p_i$ is the posterior probability, defined as the probability that individual $i$ takes modality 1 given the values of the descriptors (i.e. $P(Y_i = 1 \mid x_i)$).

- $x_i = (x_i^1, x_i^2, \ldots, x_i^d)$ is the vector of observed values of the descriptive variables.

- $\beta = (\beta_1, \beta_2, \ldots, \beta_d)$ is the vector of variable effects.

- $\beta_0$ is the intercept.

This technique serves to estimate the posterior probability $p_i$, whose value is used to assign every borrower to a group: $\{Y_i = 1\}$ if $p_i$ is greater than a fixed threshold value and $\{Y_i = 0\}$ otherwise.
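The assignment rule above can be illustrated with a minimal sketch. The coefficients and the applicant vector are hypothetical values chosen for illustration only, and the function names `posterior` and `classify` are not taken from the paper.

```python
import math

def posterior(beta0, beta, x):
    """Posterior probability P(Y = 1 | x) under the logistic model of equation (1)."""
    score = beta0 + sum(b * v for b, v in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-score))

def classify(beta0, beta, x, threshold=0.5):
    """Assign Y = 1 when the posterior exceeds the threshold, Y = 0 otherwise."""
    return 1 if posterior(beta0, beta, x) > threshold else 0

# Hypothetical coefficients and applicant (assumptions, not estimates from the paper).
beta0, beta = -1.0, [0.8, -0.5]
applicant = [2.0, 1.0]
p = posterior(beta0, beta, applicant)    # linear score = -1.0 + 1.6 - 0.5 = 0.1
label = classify(beta0, beta, applicant)
```

With these values the posterior is slightly above one half, so the applicant is assigned to the group $Y = 1$; raising the threshold would reverse the decision.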

2.2 Mixture Logistic Regression 

Recall that we deal with the problem of discrimination in the case of a mixture of subpopulations, where the two subpopulations of interest are the borrowers who are customers and the borrowers who are non-customers, denoted respectively $\Omega$ and $\Omega^*$, with which we associate the two posterior probabilities $p$ and $p^*$. Our purpose is to predict the solvency of non-customer borrowers using the information at hand on the two subpopulations.

Given two learning samples $S_L = \{(x_i, Y_i) : i = 1, \ldots, n\}$ and $S_L^* = \{(x_i^*, Y_i^*) : i = 1, \ldots, n^*\}$, where the pairs $(x_i, Y_i)$ and $(x_i^*, Y_i^*)$ are independent and identically distributed (i.i.d.) realizations of the random couples $(x, Y)$ and $(x^*, Y^*)$, we consider the logistic model over $\Omega$, given by



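$$p = P(Y_i = 1 \mid x_i, \theta) = \frac{\exp(\beta_0 + \beta^T x_i)}{1 + \exp(\beta_0 + \beta^T x_i)} \tag{2}$$

and over $\Omega^*$

$$p^* = P(Y_i^* = 1 \mid x_i^*, \theta^*) = \frac{\exp(\beta_0^* + \beta^{*T} x_i^*)}{1 + \exp(\beta_0^* + \beta^{*T} x_i^*)}, \tag{3}$$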

where 

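- $\theta = \{(\beta_0 \| \beta^T) \in \mathbb{R}^{d+1}\}$ and $\theta^* = \{(\beta_0^* \| \beta^{*T}) \in \mathbb{R}^{d+1}\}$ are the sets of all parameters to be estimated over $\Omega$ and $\Omega^*$ respectively.

- $(\beta_0 \| \beta^T)$ and $(\beta_0^* \| \beta^{*T})$ are the concatenations of the intercept and the vector of variable effects over $\Omega$ and $\Omega^*$.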

The mixture model allows various discrimination problems to be solved. In our case we assume that a well-tested rule for prediction on the first subpopulation $\Omega$ is known and that we have a small learning sample from the second subpopulation $\Omega^*$. From the available data we want to obtain a new allocation rule over $\Omega^*$.

According to Beninel and Biernacki (2009), links between subpopulations could exist and, consequently, information on $\Omega$ could provide some information on $\Omega^*$. The existence of a link between the variable vectors implies a link between the two score functions given in (2) and (3). Using acceptable links between the score functions of the two subpopulations makes it possible to use the hidden information in the samples $S_L$ and $S_L^*$ to obtain the allocation rules over $\Omega^*$. In what follows we look for these links, based on results found in the Gaussian case.



3 Gaussian Case and Links Models 

In order to estimate the parameters of the score function over $\Omega^*$, we use the data at hand from the two subpopulations. Using the data of the customers subpopulation $\Omega$ aims to compensate for the small size of the non-customers subpopulation $\Omega^*$, by assuming the existence of hidden links between the distribution of the variables over $\Omega$ and that over $\Omega^*$.

It is known from Beninel and Biernacki (2007) as well as Bouveyron and Jacques (2009) that the existence of particular connections between the variable distributions leads to relations between the parameters of their respective logistic regression models; consequently, our task consists in finding these links. In this context, a preliminary case study was successfully carried out in the multivariate Gaussian case (Beninel and Biernacki, 2005). The question here is to extend the results found in the Gaussian case to the logistic case, which leads to simple and parsimonious models linking the parameters of the logistic classification rules associated with the two subpopulations $\Omega$ and $\Omega^*$.

3.1 Gaussian Case: Subpopulations Links

In Gaussian discrimination, it is crucial to define the data handled in terms of two samples: a learning sample $L$ and a prediction sample $P$, drawn respectively from the subpopulations $\Omega$ and $\Omega^*$. In our case these two subpopulations are different.

The learning sample $L$ is composed of $n$ pairs $(x_i, Y_i)$, $i = 1, \ldots, n$, where $x_i$ is a vector of $\mathbb{R}^d$ representing the numeric characteristics describing individual $i$ and $Y_i$ is his group label. The $n$ pairs $(x_i, Y_i)$ are supposed to be i.i.d. realizations of the random couple $(x, Y)$ defined over $\Omega$ by the following joint distribution:



$$x \mid Y = k \sim N_d(\mu_k, \Sigma_k), \quad k \in \{1, \ldots, K\}, \qquad \text{and} \qquad Y \sim M_K(1, \pi_1, \ldots, \pi_K), \tag{4}$$

where $N_d(\mu_k, \Sigma_k)$ is the Gaussian distribution of dimension $d$ with mean $\mu_k$ and variance-covariance matrix $\Sigma_k$, and $M_K(1, \pi_1, \ldots, \pi_K)$ is the multinomial distribution with parameters $\pi_1, \ldots, \pi_K$, where $\pi_k$ is the proportion of group $k$ in the subpopulation $\Omega$ and $K$ is the number of modalities of the target variable $Y$.

The prediction sample $P$ consists of $n^*$ individuals for which we know the numeric characteristics $x_i^*$, $i = 1, \ldots, n^*$, assumed to be the same as those over $L$. The $n^*$ labels $Y_1^*, \ldots, Y_{n^*}^*$ are to be estimated. The $n^*$ pairs $(x_i^*, Y_i^*)$ are supposed to be i.i.d. realizations of the random couple $(x^*, Y^*)$ defined over $\Omega^*$ by the following joint distribution:



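$$x^* \mid Y^* = k \sim N_d(\mu_k^*, \Sigma_k^*), \quad k \in \{1, \ldots, K\}, \qquad \text{and} \qquad Y^* \sim M_K(1, \pi_1^*, \ldots, \pi_K^*). \tag{5}$$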



Then, we try to estimate the $n^*$ unknown labels using the information from the samples $L$ and $P$. Our task is therefore to identify acceptable relations linking the two subpopulations. To bring to light the existing links between the two subpopulations, we adopt the approach proposed by Beninel and Biernacki (2009), which supposes the existence of a map $\phi_k : \mathbb{R}^d \to \mathbb{R}^d$ linking in law the random variable vectors of $\Omega$ and $\Omega^*$. Then



$$x^* \mid Y^* = k \;\sim\; \phi_k(x \mid Y = k) = [\phi_{k1}(x \mid Y = k), \ldots, \phi_{kd}(x \mid Y = k)]^T. \tag{6}$$

The results of Beninel and Biernacki (2009) show that the function $\phi_k$ is affine; we derive from equation (6) the following relation between the variable distributions:

$$x^* \mid Y^* = k \;\sim\; A_k \, (x \mid Y = k) + \alpha_k, \tag{7}$$

where $A_k$ is a diagonal matrix of $\mathbb{R}^{d \times d}$ and $\alpha_k$ is a vector of $\mathbb{R}^d$. From the previous expression we deduce the following links between the parameters of the two subpopulations:

$$\mu_k^* = A_k \mu_k + \alpha_k, \tag{8}$$

$$\Sigma_k^* = A_k \Sigma_k A_k. \tag{9}$$


3.2 Gaussian Case Extended to Logistic Case 



Anderson (1982) proved the existence of a link between the parameters of the Gaussian mixture model and those of the corresponding logistic model. Links between the two subpopulations can be obtained in the stochastic case where the variable vectors $x$ and $x^*$ defined over $\Omega$ and $\Omega^*$ are Gaussian and homoscedastic conditionally on the groups. With the common variance-covariance matrices denoted as follows:



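$$\Sigma = \Sigma_1 = \Sigma_2 \quad \text{and} \quad \Sigma^* = \Sigma_1^* = \Sigma_2^*, \tag{10}$$

we obtain the following links between the logistic parameters and the Gaussian ones for the two subpopulations: over $\Omega$,

$$\beta_0 = \tfrac{1}{2}\left(\mu_2^T \Sigma^{-1} \mu_2 - \mu_1^T \Sigma^{-1} \mu_1\right) \quad \text{and} \quad \beta = \Sigma^{-1}(\mu_1 - \mu_2), \tag{11}$$

and over $\Omega^*$,

$$\beta_0^* = \tfrac{1}{2}\left(\mu_2^{*T} \Sigma^{*-1} \mu_2^* - \mu_1^{*T} \Sigma^{*-1} \mu_1^*\right) \quad \text{and} \quad \beta^* = \Sigma^{*-1}(\mu_1^* - \mu_2^*). \tag{12}$$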

Replacing the $\mu_k^*$, $k = 1, 2$, and $\Sigma^*$ by their expressions given by equations (8) and (9), and restricting ourselves to the linear relations that can exist between the parameters of the two subpopulations, we obtain the following expressions for $\beta_0^*$ and $\beta^*$:

$$\beta_0^* = c + \beta_0 \quad \text{and} \quad \beta^* = \Lambda \beta. \tag{13}$$

Consequently, the scoring function obtained by replacing the parameters $\beta_0^*$ and $\beta^*$ in equation (3) is given by:



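$$P(Y_i^* = 1 \mid x_i^*, \theta, \varrho) = \frac{\exp(c + \beta_0 + (\Lambda\beta)^T x_i^*)}{1 + \exp(c + \beta_0 + (\Lambda\beta)^T x_i^*)}, \tag{14}$$

where $\varrho = \{(c, \Lambda) \in \mathbb{R}^{d+1}\}$ is the set of transition parameters to be estimated.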
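As a worked check of the passage from the Gaussian links to the logistic links, the following sketch computes the logistic parameters on both subpopulations and compares them. It rests on simplifying assumptions that are not stated in the paper: a diagonal common covariance matrix, an affine link shared by both groups ($A_1 = A_2 = A$, $\alpha_1 = \alpha_2 = \alpha$), and purely hypothetical numerical values. Under these assumptions the slope link is $\beta^* = A^{-1}\beta$, and the intercepts differ by a constant $c$.

```python
def logistic_from_gaussian(mu1, mu2, var):
    """Logistic parameters implied by two homoscedastic Gaussian groups with a
    diagonal common covariance (no prior term, following equation (11))."""
    beta = [(m1 - m2) / v for m1, m2, v in zip(mu1, mu2, var)]
    beta0 = 0.5 * sum((m2 * m2 - m1 * m1) / v for m1, m2, v in zip(mu1, mu2, var))
    return beta0, beta

# Hypothetical Gaussian parameters over the customers subpopulation.
mu1, mu2, var = [1.0, 2.0], [0.0, 1.0], [1.0, 4.0]
# Hypothetical affine link common to both groups: x* = A x + alpha, with A diagonal.
a, alpha = [2.0, 0.5], [1.0, 1.0]

mu1_s = [ai * m + al for ai, m, al in zip(a, mu1, alpha)]   # mean link, equation (8)
mu2_s = [ai * m + al for ai, m, al in zip(a, mu2, alpha)]
var_s = [ai * ai * v for ai, v in zip(a, var)]              # covariance link, equation (9)

beta0, beta = logistic_from_gaussian(mu1, mu2, var)
beta0_s, beta_s = logistic_from_gaussian(mu1_s, mu2_s, var_s)

lam = [1.0 / ai for ai in a]    # diagonal of the matrix linking beta to beta*
c = beta0_s - beta0             # intercept shift between the two subpopulations
```

Here `beta_s` equals `lam[j] * beta[j]` component by component, which is the linear slope relation of equation (13), while `c` absorbs the change in the intercept.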
3.3 Links Models

Estimation of the links between the $\Omega$ and $\Omega^*$ subpopulations is done through several simple intermediate logistic link models, inspired by the Gaussian case discussed in subsection 3.1. Our purpose in this paper is the estimation and comparison of these models, listed in the following table.



| Model | Parameters | Description |
|---|---|---|
| M1 | $c = 0$, $\Lambda = I_d$ | The score functions are invariant. |
| M2 | $c = 0$, $\Lambda = \lambda I_d$ | The score functions of the two subpopulations differ only through the scalar parameter $\lambda$. |
| M3 | $c \in \mathbb{R}$, $\Lambda = I_d$ | The score functions of the two subpopulations differ only through the scalar parameter $\beta_0^*$. |
| M4 | $c \in \mathbb{R}$, $\Lambda = \lambda I_d$ | The score functions of the two subpopulations differ through the couple $(\beta_0^*, \lambda)$. |
| M5 | $c = 0$, $\Lambda \in \mathbb{R}^{d \times d}$ | The score functions of the two subpopulations differ only through the vector parameter $\beta^*$. |
| M6 | $c \in \mathbb{R}$, $\Lambda \in \mathbb{R}^{d \times d}$ | There is no longer any stochastic link between the logistic discriminations of the two subpopulations. All parameters are free. |

Tab. 1 - Links models



For each of the above models, the transition parameters are estimated conditionally on the parameters of the subpopulation $\Omega$. We add a seventh model, denoted M7, which consists in using all the borrowers (customers and non-customers) as observations and applying a simple logistic regression. This amounts to the joint estimation of the $\Omega$ parameters and the transition parameters.
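The constraints of models M1 to M6 can be written down as a small lookup table. The sketch below shows how the target parameters $(\beta_0^*, \beta^*)$ of equation (13) follow from the customer parameters and the transition parameters. The names `LINK_MODELS`, `target_params` and `score`, the numerical values, and the treatment of the free matrix in M5 and M6 as diagonal are assumptions made for illustration.

```python
import math

# Free transition parameters of each link model in Table 1.
# "scalar" means Lambda = lambda * I_d; "diag" means a free diagonal Lambda (assumption).
LINK_MODELS = {
    "M1": {"c": False, "Lambda": "identity"},
    "M2": {"c": False, "Lambda": "scalar"},
    "M3": {"c": True, "Lambda": "identity"},
    "M4": {"c": True, "Lambda": "scalar"},
    "M5": {"c": False, "Lambda": "diag"},
    "M6": {"c": True, "Lambda": "diag"},
}

def target_params(model, beta0, beta, c=0.0, lam=None):
    """Return (beta0*, beta*) from equation (13) under the constraints of `model`."""
    spec = LINK_MODELS[model]
    d = len(beta)
    if spec["Lambda"] == "identity":
        diag = [1.0] * d
    elif spec["Lambda"] == "scalar":
        diag = [float(lam)] * d
    else:
        diag = [float(v) for v in lam]
    shift = c if spec["c"] else 0.0      # models without a free c keep beta0 unchanged
    return beta0 + shift, [l * b for l, b in zip(diag, beta)]

def score(beta0_star, beta_star, x_star):
    """Posterior P(Y* = 1 | x*) of equation (14)."""
    z = beta0_star + sum(b * v for b, v in zip(beta_star, x_star))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical customer parameters and transition parameters for model M4.
b0s, bs = target_params("M4", beta0=-1.0, beta=[0.8, -0.5], c=0.4, lam=0.5)
```

Model M1 ignores any transition parameter passed to it, which reflects the bank practice of applying the customer rule unchanged to non-customers.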



4 Empirical Analysis 

4.1 Credit Data Set and Subpopulations Definition 

The data set adopted here is a real-world data set, the German credit data, illustrated in Figure 1 and available from the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/datasets.html); see also Fahrmeir and Tutz (1994) for a fuller description. The German credit scoring data set is often used by credit specialists. It covers a sample of 1000 credit consumers, of whom 700 are creditworthy applicants and 300 are not. Each applicant is described by a binary target variable Kredit, with Kredit = 1 for creditworthy and Kredit = 0 otherwise. Twenty other input variables are assumed to influence this target variable, among them the duration of the credit in months (Laufzeit), repayment behaviour on other loans (Moral), value of savings or stocks (Sparkont), stability of employment (Beszeit), further running credits (Weitkred), ...

In this case study we are interested in evaluating the ability of non-customer borrowers to pay back loans. We use the variable Laufkont (balance of current account) to separate the available data set into two subpopulations: Laufkont > 1 defines the customers subpopulation, composed of 726 borrowers, and Laufkont = 1 defines the non-customers subpopulation, composed of 274 borrowers. Afterwards, we divide the subpopulation of non-customer borrowers into two samples: a learning sample $S_L^*$ and a test sample $S_T^*$. The first sample is used to fit the various models and derive allocation rules; the second is used to check the reliability of the models established in the learning step.

Fig. 1 - German data set description.
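The separation into subpopulations and the learning/test division can be sketched as follows. The records are toy stand-ins generated to reproduce the subpopulation sizes quoted above, not the actual data; the dictionary key `laufkont` and the function names are hypothetical.

```python
import random

def split_subpopulations(records, key="laufkont"):
    """Separate customers (key > 1) from non-customers (key == 1)."""
    customers = [r for r in records if r[key] > 1]
    non_customers = [r for r in records if r[key] == 1]
    return customers, non_customers

def design_test_split(sample, n, rng):
    """Draw a random learning (design) sample of size n; the rest is the test sample."""
    shuffled = sample[:]
    rng.shuffle(shuffled)
    return shuffled[:n], shuffled[n:]

# Toy records: 274 non-customers (laufkont == 1) and 726 customers (laufkont in 2..4).
records = [{"id": i, "laufkont": 1 if i < 274 else 2 + i % 3} for i in range(1000)]
customers, non_customers = split_subpopulations(records)

rng = random.Random(0)   # fixed seed so the split is reproducible
learning, test = design_test_split(non_customers, 200, rng)
```

Repeating `design_test_split` with fresh random draws gives the repeated random splits used to average the error rates.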



4.2 Experiments Description 

To obtain a robust estimate of the performance of our seven models, our simulations involve taking 50 random splits of the non-customers subpopulation into a design sample (of size $n \in \{50, 100, 150, 200\}$) and a test sample. For each design sample, the following algorithm is applied to estimate the parameters of each of our seven logistic models.



Algorithm 1 ESTIM($x$, $x^*$, $S_L^*$) returns $\theta^*$

Require: $x$: customer design matrix defined over $\Omega$.
$x^*$: non-customer design matrix defined over $\Omega^*$.
$S_L^*$: non-customer learning sample, $S_L^* \subset \Omega^*$.

Ensure: $\theta^*$: set of all parameters to be estimated over $\Omega^*$, $\theta^* = \{(\beta_0^* \| \beta^{*T}) \in \mathbb{R}^{d+1}\}$.

1: Estimate the set of parameters $\theta$, $\theta = \{(\beta_0 \| \beta^T) \in \mathbb{R}^{d+1}\}$, using a simple logistic regression on $x$.

2: Estimate the set of transition parameters $\varrho$, $\varrho = \{(c \| \Lambda) \in \mathbb{R}^{d+1}\}$, using the learning sample $S_L^*$ and the design matrix $x^*$.

3: Replace the parameters in equation (13) by their values found in Step 1 and Step 2.

4: return $\theta^*$.
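Steps 2 and 3 of the algorithm can be sketched for model M4, where the transition parameters reduce to a scalar pair $(c, \lambda)$. This is an illustrative sketch rather than the authors' implementation: it assumes that Step 1 has already produced the customer parameters, it uses a hand-written Newton-Raphson fit, and it runs on synthetic data with a hypothetical true link. The synthetic sample is deliberately much larger than the design samples of the paper so that the illustration is numerically stable.

```python
import math
import random

def fit_intercept_slope(scores, labels, iters=25):
    """Newton-Raphson fit of logit P(Y = 1) = a + lam * s for a single covariate s."""
    a, lam = 0.0, 1.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a + lam * s)))
            w = p * (1.0 - p)
            g0 += y - p            # gradient with respect to a
            g1 += (y - p) * s      # gradient with respect to lam
            h00 += w               # entries of the 2 x 2 information matrix
            h01 += w * s
            h11 += w * s * s
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        lam += (h00 * g1 - h01 * g0) / det
    return a, lam

def estim_m4(beta0, beta, design):
    """Steps 2 and 3 of Algorithm 1 for model M4; (beta0, beta) comes from Step 1."""
    scores = [sum(b * v for b, v in zip(beta, x)) for x, _ in design]
    labels = [y for _, y in design]
    a, lam = fit_intercept_slope(scores, labels)
    c = a - beta0                                   # because a = c + beta0
    return beta0 + c, [lam * b for b in beta]       # equation (13), Lambda = lam * I_d

# Synthetic illustration: hypothetical customer parameters, true link (c, lam) = (0.8, 0.5).
rng = random.Random(0)
beta0, beta = -0.5, [1.0, -1.0]
design = []
for _ in range(4000):
    x = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
    z = (0.8 + beta0) + 0.5 * sum(b * v for b, v in zip(beta, x))
    y = 1 if rng.random() < 1.0 / (1.0 + math.exp(-z)) else 0
    design.append((x, y))

b0_star, beta_star = estim_m4(beta0, beta, design)
```

Because the customer coefficients are held fixed, only two quantities are estimated from the non-customer sample, which is what makes the constrained models attractive when that sample is small.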



Table 2 summarizes the parameters to be estimated by the previous algorithm for the seven models studied. Once all parameters are estimated, an estimate of the applicant's group label is obtained by replacing the parameters by their values in equation (14).

Most applications of assignment procedures use the misclassification error rate as the evaluation criterion. In this case study our choice was less usual: we decided to work with the test error rate, the Type II error rate and the Type I error rate. The aim was to focus on minimizing the number of defaulting applicants who are accepted, by minimizing the Type I error.
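The three criteria can be computed as in the sketch below. It assumes the coding $Y = 1$ for reliable and $Y = 0$ for non-reliable applicants, and it assumes that the Type I and Type II rates are normalized by the number of truly non-reliable and truly reliable applicants respectively; the paper does not state the normalization explicitly. The labels are toy values.

```python
def error_rates(y_true, y_pred):
    """Return (test error, Type I rate, Type II rate), with Y = 1 reliable, Y = 0 non-reliable."""
    n_bad = sum(1 for y in y_true if y == 0)
    n_good = len(y_true) - n_bad
    # Type I: a non-reliable applicant is predicted reliable (a bad loan is accepted).
    type1 = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    # Type II: a reliable applicant is predicted non-reliable (a good loan is rejected).
    type2 = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    return (type1 + type2) / len(y_true), type1 / n_bad, type2 / n_good

# Toy labels for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 1, 1, 0, 0, 1, 0, 1]
test_error, type1_rate, type2_rate = error_rates(y_true, y_pred)
```

In this toy example one of the four non-reliable applicants is accepted and one of the six reliable applicants is rejected.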



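| Model | Learning sample | Transition parameters | Estimated parameters |
|---|---|---|---|
| M1 | $\Omega$ | | $\beta_0^* = \beta_0$ and $\beta^* = \beta$ |
| M2 | $S_L^*$ | $\lambda$ | $\beta_0^* = \beta_0$ and $\beta^* = \lambda\beta$ |
| M3 | $S_L^*$ | $c$ | $\beta_0^* = c + \beta_0$ and $\beta^* = \beta$ |
| M4 | $S_L^*$ | $c$ and $\lambda$ | $\beta_0^* = c + \beta_0$ and $\beta^* = \lambda\beta$ |
| M5 | $S_L^*$ | $\Lambda$ | $\beta_0^* = \beta_0$ and $\beta^* = \Lambda\beta$ |
| M6 | $S_L^*$ | $c$ and $\Lambda$ | $\beta_0^* = c + \beta_0$ and $\beta^* = \Lambda\beta$ |
| M7 | | | $\beta_0^*$ and $\beta^*$ |

Tab. 2 - Summary of parameters to be estimated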



4.3 Experimental Results 

The results obtained for the German credit data set using the seven models are summarized in Tables 3, 4 and 5 respectively.



| Models | M1 | M2 | M3 | M4 | M5 | M6 | M7 |
|---|---|---|---|---|---|---|---|
| n = 50 | 0.348 | 0.370 | 0.348 | 0.347 | 0.358 | 0.361 | 0.343 |
| n = 100 | 0.385 | 0.362 | 0.344 | 0.345 | 0.347 | 0.344 | 0.385 |
| n = 150 | 0.356 | 0.354 | 0.330 | 0.332 | 0.338 | 0.342 | 0.354 |
| n = 200 | 0.367 | 0.337 | 0.308 | 0.308 | 0.321 | 0.315 | 0.334 |

Tab. 3 - Results summary for predictive credit test error rate with respect to the learning sample size



| Models | M1 | M2 | M3 | M4 | M5 | M6 | M7 |
|---|---|---|---|---|---|---|---|
| n = 50 | 0.282 | 0.312 | 0.338 | 0.332 | 0.385 | 0.341 | 0.283 |
| n = 100 | 0.226 | 0.304 | 0.339 | 0.311 | 0.364 | 0.356 | 0.284 |
| n = 150 | 0.209 | 0.286 | 0.296 | 0.321 | 0.344 | 0.338 | 0.279 |
| n = 200 | 0.185 | 0.283 | 0.294 | 0.296 | 0.305 | 0.245 | 0.203 |

Tab. 4 - Results summary for predictive credit Type II error with respect to the learning sample size



| Models | M1 | M2 | M3 | M4 | M5 | M6 | M7 |
|---|---|---|---|---|---|---|---|
| n = 50 | 0.394 | 0.321 | 0.284 | 0.285 | 0.291 | 0.301 | 0.376 |
| n = 100 | 0.384 | 0.301 | 0.279 | 0.275 | 0.281 | 0.297 | 0.362 |
| n = 150 | 0.384 | 0.281 | 0.230 | 0.233 | 0.253 | 0.275 | 0.316 |
| n = 200 | 0.336 | 0.271 | 0.220 | 0.218 | 0.250 | 0.273 | 0.278 |

Tab. 5 - Results summary for predictive credit Type I error with respect to the learning sample size

We found no significant differences between models M3 and M4: these two models achieve almost the same test error rate in Table 3 and almost the same Type II and Type I errors in Tables 4 and 5 for the different training sizes. Table 3 shows that the test error rate tends to decrease as the learning sample size grows; this improvement can be attributed to the estimates of the models' parameters, which become more precise as the training data size increases. Tables 4 and 5 show that the Type II and Type I errors likewise tend to decrease with the design sample size; these results underline the importance of the sample size in classification.

As shown in Table 3, the test error rate of models M3 and M4 reaches 0.308 (for $n = 200$), which is the lowest rate of misclassified instances; according to this first criterion, these are the two best classification models. Among the remaining models, M5 and M6 also achieve good results, followed by model M2; the last two models, M1 and M7, yield the highest test error rates, especially model M1, which appears to be the worst.

The test error rate, however measured, is only one aspect of performance and may not be the most precise criterion; the type of misclassification is another aspect, so each model is also evaluated by its Type I and Type II error rates. Recall that the cut-off threshold used in this case study is 0.5: all applicants whose estimated probability of non-reliability $P(Y = 0)$ is greater than 0.5 are assessed as non-reliable, and the others are classified as reliable. In Table 4, models M1 and M7 achieve error rates of 0.185 and 0.203 (for $n = 200$), which are the lowest Type II error rates; on the other hand, models M5 and M6, followed by M2, have the highest rates. This kind of error arises when a reliable applicant is predicted as non-reliable. Models M5 and M6 are therefore less efficient at predicting reliable applicants.

Table 5 summarizes the Type I error for the seven models. A Type I error means predicting a non-reliable client as reliable; this kind of error is more dangerous and more costly than the previous one, so the model with the lowest Type I error rate is considered the best model. From Table 5 we note that models M3 and M4 have the lowest Type I error rates, followed by models M5, M6 and M2; on the other hand, models M1 and M7 have the highest Type I error rates. It seems that these last two models have greater difficulty in predicting non-reliable clients than reliable ones.

The previous misclassification rates are obtained with a cut-off of 0.5; however, changing this threshold might modify the previous results and can allow the decision maker to catch a greater number of good or bad applicants. Hand (2001) proposed the use of graphical tools as evaluation criteria in place of scalar criteria. In this paper we use the ROC (receiver operating characteristic) curve to evaluate our seven models. The ROC curve shows how the errors change when the threshold varies; it plots the positive instances against the negative instances, which makes it possible to find the middle ground between specificity and sensitivity.

Fig. 2 - ROC curves.

Figure 2 shows the ROC curves of our models. The X axis represents the models' 1 - specificity (i.e. the Type II error rate) and the Y axis represents the models' sensitivity (i.e. 1 - Type I error rate). According to Liu and Schumann (2002), a model whose ROC curve follows the 45° line would be useless: it would classify the same proportion of unworthy and worthy applicants into the unworthy class at each value of the threshold. Figure 2 shows that the curves of the seven models are convex and lie above the first bisector, which leads us to affirm that our models are statistically valid and not useless. In Figure 2 we note that the curves of models M3 and M4 appear considerably higher than the other models' curves, which confirms our intuition about their performance. Models M1, M7 and M2 have the lowest AUC (area under the curve); from the point of view of the balance between false positives and false negatives, these models perform poorly.
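The construction of such a curve and of its AUC can be sketched by sweeping the threshold over the estimated probabilities. In line with the axes described above, the sketch takes the non-reliable class ($Y = 0$) as the positive class and uses the estimated probability of non-reliability as the score; the labels, the scores and the function names are toy values and assumptions, not results from the paper.

```python
def roc_points(y_true, p_bad):
    """ROC points (1 - specificity, sensitivity), with non-reliable (Y = 0) as the positive class."""
    thresholds = sorted(set(p_bad), reverse=True)
    n_pos = sum(1 for y in y_true if y == 0)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    for t in thresholds:
        # Applicants with p_bad >= t are predicted non-reliable at threshold t.
        tp = sum(1 for y, p in zip(y_true, p_bad) if p >= t and y == 0)
        fp = sum(1 for y, p in zip(y_true, p_bad) if p >= t and y == 1)
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Toy labels and estimated probabilities of non-reliability.
y_true = [0, 0, 1, 0, 1, 1]
p_bad = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
points = roc_points(y_true, p_bad)
area = auc(points)
```

A classifier that ranks applicants at random would give an area close to 0.5, which corresponds to the 45° line mentioned above.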

The performance measures discussed in this section served to evaluate the validity and the discriminant power of the models studied. From the previous results, we note that model M1, the one most commonly practiced by banks, seems the least successful model once applied to non-customer borrowers' data; this confirms the difference between the two subpopulations studied. We also note that models M5 and M6 might be good classifiers. However, models M3 and M4 seem to be the most suitable models for predicting the ability of non-customers to pay back loans. The latter are the best predictive models because their constant is computed from the non-customer learning sample independently of the customer sample, which suggests the importance of this constant in predicting the reliability target variable Kredit and confirms the existence of a certain link between the two subpopulations $\Omega$ and $\Omega^*$. Model M5 has the highest Type II error rate; this model is considered cautious, but its use can lead to a loss of reliable borrowers.

To be sure of our models' performance, we compare in what follows the performance of models M3 and M4 with two successful classification techniques:

SVM (Vapnik, 1995) is one of the most outstanding machine learning techniques. The use of SVM in financial applications has been discussed in several works (Schebesch and Stecking, 2005; Min and Lee, 2005; Huang et al., 2007; Wang, 2008; Bellotti and Crook, 2009). There are many reasons for choosing SVM (Burges, 1998): it requires fewer prior assumptions about the input data and can work on small or huge data sets by performing a nonlinear mapping from the original input space into a high-dimensional feature space.

Decision trees (Breiman et al., 1984) were used in credit scoring for the first time by ?. DT is a very simple method and can be described as a set of nodes and edges. The root node defines the first split of the credit applicants sample, and each internal node splits the set of instances into two subsets. The operation is repeated until each node contains individuals of a single class or no further separation into subpopulations is possible.



n = 200

| Method | Model | Test Error | Type I | Type II |
|---|---|---|---|---|
| Logistic | M3 | 0.308 | 0.220 | 0.294 |
| | M4 | 0.308 | 0.218 | 0.296 |
| SVM | Radial | 0.215 | 0.240 | 0.304 |
| | Polynomial | 0.355 | 0.344 | 0.363 |
| DT | ID3 | 0.395 | 0.222 | 0.431 |
| | C4.5 | 0.275 | 0.258 | 0.298 |

Tab. 6 - Results summary of error rate for SVM, DT and links model M3 and M4.



Table 6 summarizes the average error rates obtained with two SVMs (radial and polynomial kernels), two DTs (ID3 and C4.5) and the previous results of models M3 and M4, based on a training sample of 200 instances. Table 6 shows that, for the test error rate, the radial SVM and C4.5 appear to be the best, followed by M3 and M4. Despite the performance of the added techniques, models M3 and M4 yield slightly lower Type I and Type II error rates. The results confirm the performance of the proposed link models.



5 Conclusion 

In this paper we have considered the problem of creditworthiness evaluation for a population of insufficient size. We proposed seven simple logistic submodels combining the classification rule on the customers subpopulation and the labeled sample from the non-customers subpopulation. The performance of the seven models was compared, and models M3 and M4 were selected as the best classification models for the non-customers subpopulation; these two models outperform the traditional classification model M1.
This research could have produced more interesting results had a larger sample of non-customers been available. As future work, we plan to apply logistic regression using non-linear links between the two subpopulations. We could also apply a non-parametric approach, which may prove efficient once the linear models reach their limits.
Summary

Credit risk management is a fast-growing field, owing to the expansion of credit offers. Credit requests from new and existing customers are often evaluated using classical discrimination rules based on the repayment behavior of existing customers. This kind of approach has limits and does not take into account the differences in characteristics between customers and non-customers. The aim of this study is to predict the behavior of non-customer borrowers, given a heterogeneous population formed by the bank's customers and the others. This work draws on earlier work extending discriminant analysis through the transfer of Gaussian models, and transposes the idea to the logistic model. We are interested, in particular, in setting up simple and parsimonious link models between the parameters of the logistic models associated respectively with the two subpopulations. The different transfer models were tested and compared on a database consisting of 1000 consumer credits from a German bank.



